Method of providing selected content items to a user

ABSTRACT

A method for providing selected content items to a user. The selection of content items is based on metadata pre-assigned to content items, typically authored content metadata, and on metadata generated and associated afterwards, called derived content metadata. Additionally, the selection of content items can be based also on context metadata, particularly derived context metadata. Derived metadata are automatically generated on the basis of derivation rules corresponding to algorithms to be applied to, e.g., the content of content items, authored content metadata and context metadata. User profiles can be used for improving the selection quality. A method is also disclosed for building and maintaining user profiles based on machine learning techniques.

CROSS REFERENCE TO RELATED APPLICATION

This application is a national phase application based on PCT/EP2005/011580, filed Oct. 28, 2005, the content of which is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to a method of providing selected content items to a user. The present invention is intended to be used for advanced content-based services provided through telecommunications means such as a data network (e.g. the Internet) or a telephone network (e.g. the UMTS network).

The present invention is connected to the fields of content filtering, information retrieval, personalization of services and user profiling.

The present invention is particularly useful when applied to rich multimedia content items, i.e. those comprising different media content components (text, images, audio, video, etc.).

BACKGROUND OF THE INVENTION

Every day, a great deal of information is published around the world and made available to people through different information media, e.g. the press, television and the Internet. The amount of information is increasing rapidly.

Unfortunately, the vast amount of information provided by the currently available sources can often be overwhelming to an individual, who may become incapable of, or uninterested in, sorting through the information for items that he or she finds of interest. What is needed, therefore, is a service or capability that provides a user with only the information that the user will find of interest.

It has long been common to filter content items on the basis of “keywords”. Keywords are provided by a user to a software application; the software application may be local to the user's computer or may be running on a remote computer connected to the user's computer through e.g. the Internet. The software application returns to the user all the available content items that are associated with the keywords specified by the user.

Keywords are a very common type of metadata. In the past, “metadata” was defined as “information about information”; for example, the “title” and the “abstract” of an “article” are metadata, as they provide information on the content of the article, which is the information itself.

More sophisticated methods have been developed for the purpose of filtering content information. These methods are based on the use of metadata for indexing the various content items, particularly “authored metadata”, i.e. metadata associated with a content item by the author of the content item or by another person.

Effective filtering of content items often requires knowledge about the user, e.g. user habits and/or user preferences.

An interesting overview of filtering documents through the use of metadata and user profiles can be found in the article by Erika Savia et al., “Metadata Based Matching of Documents and User Profiles”, in Proc. 8th Finnish Artificial Intelligence Conference, Human and Artificial Information Processing, pages 61-69, 1998.

International patent application WO 02/41579 discloses a method for grouping and transmitting multimedia data. Multimedia data are analysed in terms of their content, corresponding metadata are extracted by a metadata extraction module, and a user profile is prepared. Prior to receiving multimedia data from a central unit, the user sets and/or modifies at least parts of the user data of the user profile by means of a communication device. Multimedia data are selected by means of the metadata and on the basis of the user profile, and content-oriented multimedia data, optimised in a user-specific manner, are produced from the selected multimedia data by means of a repackaging module. These content-oriented, user-specifically optimised multimedia data are stored in a database of a content module of the central unit and provided to the user.

According to this international patent application (page 8, lines 6-14), the metadata are retrieved on the basis of a content-based indexing technology such as one of those described e.g. in U.S. Pat. Nos. 5,210,868 and 5,414,644.

SUMMARY OF THE INVENTION

The present invention relates to a method of providing content items to a user, in particular to providing selected content items by taking into account user preferences.

The basic idea behind the present invention is to automatically generate metadata and to use the generated metadata for selecting the content items to be provided.

The Applicant has realized that manual generation of metadata, i.e. typically authored metadata, is an enormously time-consuming activity (it is impossible to keep up with the ever-increasing amount of published information), is error-prone and is generally not service-oriented. It is therefore very difficult in practice to reach a level of accuracy suitable for filtering enormous amounts of published information.

The Applicant has also realized that it is important to have accurate user profiles and to be able to build them and keep them updated automatically.

Additionally, the Applicant has noticed that the interaction context in which content is provided can be advantageously used for building or updating user profiles.

The method according to the invention can be provided by a service provider that offers a service of delivering personalized content to users. The above considerations apply whether the content items are provided in PULL mode or in PUSH mode.

According to the present invention, content metadata, and preferably also context metadata, are generated automatically. Derived metadata are automatically generated on the basis of derivation rules corresponding to algorithms to be applied to e.g. the content of content items, authored content metadata and raw context metadata.

The above features provide flexibility and power to the selection method according to the present invention.

The derived metadata, in addition to explicit and/or implicit user feedback, can advantageously also be used to build and maintain user profiles. In this way, the user profile is accurately built and can be accurately maintained over time. Preferably, building and maintaining (i.e., updating) user profiles is carried out through the use of machine learning techniques.

The present invention comprises a first aspect related to the processing of content items. Content processing basically corresponds to the appropriate selection of content items. In the preferred embodiments, the present invention also relates to a second aspect related to user profile processing, which basically corresponds to appropriately building and maintaining user profiles.

These two aspects are linked to one another, as an accurate content selection can be carried out on the basis of user profile matching, i.e., content items are matched against a user profile. In order to build and maintain a user profile, the user's feedback (explicit and/or implicit) on the selected content is advantageously used.

According to a first aspect of the present invention, when a request for content is received from a user, the following steps are performed:

- a query is generated starting from the above request;
- based on this query, a first set of content items is identified;
- pre-assigned content metadata (typically authored) that can be associated with each content item of this first set are identified;
- preferably, raw context metadata that represent the context information associated with the above request are identified; and
- derived metadata are generated automatically for each content item on the basis of derivation rules corresponding to algorithms to be applied to the content items and preferably also to the pre-assigned content metadata (if any) associated with said content items. More preferably, the algorithms are also applied to raw context metadata.

Preferably, after generation of the derived metadata, the first set of content items is stored in a content item repository. More preferably, the derived metadata generated following a user request and associated with the content items are also stored in the content item repository for future use, so as to avoid repeating the metadata derivation process.
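The caching behaviour of the content item repository can be illustrated with a minimal sketch, assuming that derived metadata are keyed by a content item identifier and produced by a single derivation function. The class ContentItemRepository and the rule word_count_rule are hypothetical names introduced only for this illustration.

```python
from typing import Callable, Dict


class ContentItemRepository:
    """Hypothetical store that keeps derived metadata per content item,
    so that derivation runs once per item rather than once per request."""

    def __init__(self, derive: Callable[[str], Dict[str, object]]) -> None:
        self._derive = derive                          # derivation function (assumed)
        self._derived: Dict[str, Dict[str, object]] = {}
        self.derivation_calls = 0                      # counts actual derivations

    def derived_metadata(self, item_id: str, content: str) -> Dict[str, object]:
        # Derive only on a cache miss; later requests reuse the stored result.
        if item_id not in self._derived:
            self.derivation_calls += 1
            self._derived[item_id] = self._derive(content)
        return self._derived[item_id]


def word_count_rule(content: str) -> Dict[str, object]:
    # Toy derivation rule producing a single numerical metadatum.
    return {"word_count": len(content.split())}


repo = ContentItemRepository(word_count_rule)
first_set = {"a1": "Lung models on computers", "a2": "Robot surgery"}
for _ in range(2):                  # two requests that select the same items
    for item_id, content in first_set.items():
        repo.derived_metadata(item_id, content)
print(repo.derivation_calls)        # → 2
```

Only the first request triggers derivation; the second is served entirely from the stored metadata.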

When the derived metadata associated with each content item of said first set have been generated, a second set of selected content items is provided to the user, said second set being contained within the first set of content items. According to a preferred embodiment of the present invention, the second set of content items is provided by performing the following steps:

- (derived and pre-assigned) content metadata are identified for each content item;
- preferably, (derived and raw) context metadata are identified for each content item;
- a user profile of the user that generated the request is identified;
- the content items of said first set are matched against the user profile based on at least some of the derived metadata, and a ranking is produced for the content items of this set;
- based on said ranking, the second set of content items is provided to the user, said second set corresponding to the first set ordered on the basis of said ranking or (preferably) to a sub-set containing the best-ranked content items; and
- preferably, the user feedback (explicit and/or implicit) on the provided content items is collected.
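The matching and ranking steps can be sketched as follows. The method matches items through a predictive user model; as a simplified stand-in, this sketch assumes a profile that is just a weight per metadata term, scores each item by summing the weights of its terms, and returns the best-ranked sub-set. The function rank_items, the weights and the item terms are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_items(items: Dict[str, Set[str]], profile: Dict[str, float],
               top_k: int) -> List[Tuple[str, float]]:
    """Score each item by summing the profile weights of its metadata terms
    and return the top_k items by decreasing score (ties broken by item id)."""
    scored = [(item_id, sum(profile.get(term, 0.0) for term in terms))
              for item_id, terms in items.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# First set of content items with their (derived and pre-assigned) metadata terms.
first_set = {
    "A": {"computer", "asthma", "medicine"},
    "B": {"software", "surgeon", "medicine"},
    "C": {"website", "technology"},
}
profile = {"medicine": 1.0, "software": 0.5, "website": -0.5}  # hypothetical weights
print(rank_items(first_set, profile, top_k=2))  # → [('B', 1.5), ('A', 1.0)]
```

Returning the full sorted list instead of a slice corresponds to the alternative in which the second set is the whole first set ordered by ranking.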

Each content item belonging to said second set with which a user feedback has been associated corresponds to an interaction event, which is preferably stored as a record in an interaction history repository. Thus, according to a preferred embodiment of the present invention, the following steps are performed in order to update the profile of a user:

- retrieving a plurality of interaction history records relating to the user generating the request, wherein each record includes a content item and at least one user feedback (explicit and/or implicit), generally represented by a user vote, associated with said content item. Preferably, the records also include user requests, raw context metadata and content metadata, derived and pre-assigned, associated with the content items;
- selecting a machine learning algorithm for building a predictive model of this user;
- coding each record (i.e., each stored interaction event) as a feature vector, which is a formal representation adapted to be used with the selected machine learning algorithm. Said feature vector comprises a plurality of elements corresponding to the metadata, derived and (if any) pre-assigned, associated with a specific content item, and comprises a user feedback. If derived metadata are not present in the record, they can be retrieved from the content item repository, which stores all the content items of the first set and, in general, the content items selected following a query;
- applying the selected machine learning algorithm to said feature vectors, each vector corresponding to an interaction event, thus building a predictive model (user model);
- (more preferably) validating the built predictive model; and
- updating the profile of the above user by substituting the new predictive model for the old one.
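The profile-update steps can be sketched end to end in plain Python. The method does not prescribe a particular learning algorithm; this sketch assumes logistic regression trained by stochastic gradient descent, one-hot coding of (metadata name, value) pairs, a binary vote as feedback, and a hold-out split for validation. The interaction records and every function name are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Record = Tuple[Dict[str, str], int]  # (metadata name -> value, vote: 1 = liked, 0 = disliked)
Model = Tuple[List[float], float]    # (weights, bias)


def build_vocabulary(records: Sequence[Record]) -> List[str]:
    # One binary feature per (metadata name, value) pair seen in the history.
    return sorted({f"{k}={v}" for meta, _ in records for k, v in meta.items()})


def encode(meta: Dict[str, str], vocab: List[str]) -> List[float]:
    # Feature vector: 1.0 where the record carries that metadata value, else 0.0.
    present = {f"{k}={v}" for k, v in meta.items()}
    return [1.0 if feature in present else 0.0 for feature in vocab]


def train_logistic(X: List[List[float]], y: List[int],
                   epochs: int = 200, lr: float = 0.5) -> Model:
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(model: Model, x: List[float]) -> int:
    w, b = model
    return int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0.0)


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)


history: List[Record] = [
    ({"category": "medicine", "daytime": "night-time"}, 1),
    ({"category": "medicine", "daytime": "day-time"}, 1),
    ({"category": "sport", "daytime": "night-time"}, 0),
    ({"category": "sport", "daytime": "day-time"}, 0),
    ({"category": "medicine", "daytime": "night-time"}, 1),
    ({"category": "sport", "daytime": "night-time"}, 0),
]
vocab = build_vocabulary(history)
X = [encode(meta, vocab) for meta, _ in history]
y = [vote for _, vote in history]
X_train, y_train, X_val, y_val = X[:4], y[:4], X[4:], y[4:]

profile = {"model": ([0.0] * len(vocab), 0.0)}  # old model: predicts "disliked" for everything
new_model = train_logistic(X_train, y_train)
# Validate, then substitute the new model only if it does at least as well as the old one.
if accuracy(new_model, X_val, y_val) >= accuracy(profile["model"], X_val, y_val):
    profile["model"] = new_model
print(accuracy(new_model, X_val, y_val))  # → 1.0
```

Context metadata such as "daytime" enter the feature vector as independent features alongside content metadata, in line with the preferred aspect described below.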

Therefore, according to a preferred aspect of the invention, the second set of content items is provided by applying machine learning methods to the derived metadata, and preferably to the pre-assigned metadata, taking into account the user feedback in order to define a ranking within the first set of content items. More preferably, context metadata (raw and derived) are taken into account as independent features in the feature vectors to which the machine learning methods are applied. If derived context metadata are not present in a record of the interaction history repository, they can be derived from raw context metadata on the fly, as derivation from raw context metadata is generally not computationally heavy.

The present invention will become more apparent from the following description, to be considered in conjunction with the annexed drawings, in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of a system implementing an embodiment of the method according to the present invention,

FIG. 2 shows a flow chart of the main steps in processing content items according to the present invention,

FIG. 3 shows a flow chart of the main steps in processing user profiles according to the present invention, and

FIG. 4 shows schematically a data structure adapted for storing content items and the associated metadata, which can be used for the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Before describing the present invention in detail, some terminology definitions and descriptions are provided below.

Content-Based service

In this description, a content-based service is any software application that leverages a set of existing content items to build information content that can be of value to the subscribers of the service. The way the selected content is aggregated and presented to the user is defined by a service application logic.

The user may interact with a content-based service through a service front-end in two modes, PULL or PUSH; a content-based service may provide one or both modes. In the PULL mode, the user initiates the interaction by directly accessing the service front-end, possibly providing specific inputs, to immediately obtain the desired content. In the PUSH mode, when subscribing to the content-based service (and possibly at later times), the user can provide inputs that can later generate service content. When such content is generated on the basis of those inputs, the user is notified to access the service front-end to obtain it.

Content Item

Within this technical field, content items are the basic units of a user-content interaction for a content-based service. The content-based service generally provides content items in reply to a request of a service subscriber, i.e. the user. A content item is what the user perceives as a single entity delivered by the service. However, a content item may comprise one or more content components. For example, if the content item is the video of a soccer match, the item can comprise the two halves of the match as content components.

Examples of content items are:

- a movie or TV program, from e.g. an on-demand media delivery environment;
- a news article, from e.g. an online newsreader;
- a single Web URL result, from e.g. a Web search engine;
- a song, from e.g. a content-sharing network environment;
- a picture, from e.g. an online media catalogue;
- a Web page, from e.g. Internet navigation;
- a product page, from e.g. an e-commerce catalogue.

In general, a content item is a structured object comprising one or more content components.

Each content component is a multimedia element and may be e.g. text, an image, audio, video, a three-dimensional model, vector graphics or a graphical layout.

Examples of text components are: the text of an online newspaper article, the text portion of a news article, the text contained in a Web page, the text description of a product in an e-commerce catalogue. Examples of image components are: the pictures and drawings contained in a Web page, the photos contained in a news page, the pictures contained in an online media catalogue. Examples of audio components are: the audio file containing a song in an on-demand media delivery environment, the audio file containing a song in a content-sharing environment, the audio track of a movie, the audio track of a news article. Examples of video components are: the file containing a movie or TV program in an on-demand media delivery environment, the file containing a video in a content-sharing environment, the video portion of a news page. An example of a three-dimensional model component is the 3D model representing a piece of furniture in an online e-commerce catalogue. Examples of vector graphics components are: a Flash animation in a Web page, an SVG [Scalable Vector Graphics] document. An example of a graphical layout is the graphical layout of a Web page.

An example of a multi-component content item is a news item relating to a piece of news and consisting of a text component (i.e. a brief text describing the piece of news), an audio-video component (i.e. an audio-video sequence describing and showing the piece of news) and an audio component (i.e. an audio sequence describing the piece of news).
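The structure of a multi-component content item can be sketched with Python dataclasses, assuming each component carries a media type label and raw data, and that the item holds a single metadata dictionary for both pre-assigned and derived metadata. The class names, the identifier news_42 and the "politics" category are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentComponent:
    media_type: str   # e.g. "text", "audio-video", "audio"
    data: bytes       # raw media data (placeholder bytes here)


@dataclass
class ContentItem:
    item_id: str
    components: List[ContentComponent] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)  # pre-assigned and derived


news = ContentItem(
    "news_42",
    [
        ContentComponent("text", b"Brief text describing the piece of news"),
        ContentComponent("audio-video", b"..."),
        ContentComponent("audio", b"..."),
    ],
    metadata={"category": "politics"},
)
print([c.media_type for c in news.components])  # → ['text', 'audio-video', 'audio']
```

The user still perceives `news` as one entity; the component list only records how it is built internally.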

Content Metadata

In the past, “metadata” was defined simply as “information about information”.

More specifically (and according to the World Wide Web Consortium, W3C), content metadata consist in general of data structures that describe a given content item and can be processed automatically by a machine, e.g. a computer.

Content metadata can describe each content component belonging to a content item or the content item as a whole.

Content items made available by content providers are often provided with metadata, such as identification indexing (e.g., subject, field, ...). Metadata that are already associated with the content items when the items become available to the content-based service are referred to as pre-assigned metadata. Pre-assigned metadata are typically “authored metadata”, a type of content metadata associated with a content item by the author of the content item or by another person, typically within a content provider organization. Authored metadata are generally assigned manually to each content item by means of an annotation process.

There are different kinds of metadata, such as textual metadata, keyword metadata, categorical metadata (category labels taking a value within a limited set of values) and numerical metadata. Metadata can have a more complex structure, derived from the composition of e.g. the previous kinds (structured metadata) or corresponding to a semantic network (semantic network metadata), according e.g. to RDF [Resource Description Framework].

Examples of textual metadata are: the text description associated with a picture in a Web page (multiple authored metadata may be provided relating to the content of each picture of the page), the text summary of the content of a Web page (authored metadata relating to the Web page as a whole), the lyrics associated with a song item. Examples of keyword metadata are: a list of keywords describing the topics covered by a news item, a list of keywords associated with the features of a movie item (as in the IMDB [Internet Movie Data Base]), a list of keywords describing the main features of the scene and the subjects depicted in a picture item. Examples of categorical metadata are: a “category” label stating the news category (within a predefined set of news categories) of a news item, a “genre” label stating the musical genre (within a predefined set of musical genres) of a song item, a “color” label stating whether a movie item is “black and white” or “colour”. Examples of numerical metadata are: the integer number corresponding to the production year of a movie item, the integer number corresponding to the duration (e.g. in minutes) of a movie item, the currency amount corresponding to the purchase price of a product item in an e-commerce catalogue.

Structured metadata may apply e.g. to a movie item; in fact, a movie has a cast that can be represented as a list of actor names along with their age, their role in the movie, their gender, etc. A typical example of structured metadata representation is provided by the MPEG-7 description standard.

Context Metadata

The interaction context (for short, simply “context”) is an important element of a user-content interaction for a content-based service. In fact, every user-content interaction takes place within a context, and the context often influences the preferences of the user, making items interesting within a given context and not interesting in another.

Interaction context information can also be associated with one or more metadata, called context metadata.

The interaction context is formed by different aspects. Typically, the most important aspects are: “date and time” (when the interaction takes place), “user location” (where the interaction takes place), “interaction device” (used by the user for the interaction), “content channel” (through which the interaction takes place), “environment state” (during the interaction), “physical world state” (during the interaction) and “user state” (during the interaction).

The “user location” can be provided in several ways and in several forms, for example: spatial coordinates obtained from a GPS system or a cellular network, or logical coordinates provided e.g. by a short-range wireless beacon system transmitting a metadata description of the place where the user is located.

The features of the “interaction device” may be, for example: its mobility (i.e. mobile or fixed device), its graphic capabilities (e.g. size, resolution and number of colours of the display), its sound capabilities (e.g. number of audio channels), its brand and model.

In some environments, like an on-demand media delivery environment, each interaction involves the choice of a “content channel”, like a TV channel or a movie provider.

The “environment state” may derive e.g. from the setting of the environment options in the interaction device. For example, a mobile phone can be set into “Meeting”, “Work” or “Home” mode, or its behaviour can be set into “Ringing” or “Silent” mode.

Information relating to the “physical world state” may be provided by e.g. sensors detecting temperature, lighting conditions, humidity, pressure and wind speed.

Information relating to the “user state” can be provided e.g. by sensors detecting the acceleration of the user's body (in order to determine whether the user is standing still, walking, running or moving his hands) or some of his physiological parameters, such as heart rate, blood pressure and skin electrical conductivity (in order to determine his stress/relaxation condition).

Context metadata derived directly from one or more physical devices are hereafter referred to as “raw context metadata”. Physical devices can be, for example, a timer, a sensor or a switch (hardware or software). Often these devices are integrated within the terminal device (e.g. a mobile phone, a personal computer, etc.) that includes the user interface.

Metadata Representation

In order to simplify the access to and handling of all kinds of metadata, a uniform and easily extensible format is advantageously used.

The MPEG-7 description standard has been found particularly suitable for the present invention. In the embodiment that will be described later in the present description, this standard is used as the format for all metadata, both authored and derived. In particular, the metadata of a content item are organized as a list of zero or more “related material” blocks (see FIG. 4), where each related material describes a homogeneous block of information that can reference either an actual piece of content or an XML [Extensible Markup Language] block representing the metadata. A reference always points to a related attachment block that in turn either holds the actual content (or metadata) or provides a URL [Uniform Resource Locator] where the content (or metadata) can be found. This organization can accommodate different storage strategies while providing a centralized access point to both content components and metadata. In addition to the information contained in each related material, the content metadata can also hold a group of general information that every published content item should have, such as the creation date, or other information needed by the service application logic, such as a live status flag indicating whether the content can be considered available.

The following is a “related material”, encoded according to the MPEG-7 standard, reporting some common features of the component:

    <RelatedMaterial>
      <MediaType>Image</MediaType>
      <MediaInformation>
        <MediaIdentification>
          <Identifier>image_1</Identifier>
        </MediaIdentification>
        <MediaProfile Master="true" id="image_1_profile">
          <MediaFormat>
            <FileFormat>image/pjpeg</FileFormat>
            <AspectRatio Height="322" Width="122"/>
            <FileSize>1002</FileSize>
          </MediaFormat>
          <MediaInstance>
            <Identifier>image_1</Identifier>
            <InstanceLocator>
              <MediaURL idref="attachment_1"/>
            </InstanceLocator>
          </MediaInstance>
        </MediaProfile>
      </MediaInformation>
    </RelatedMaterial>

The above “related material” refers to the following “related attachment”, encoded according to the MPEG-7 standard, for the actual raw media data, specifically an image:

    <RelatedAttachment id="attachment_1">
      <AttachmentData Storage="internal" Encoding="Base64Binary">
        <EncodedData>
          <!-- Base64 encoding of the image not shown -->
        </EncodedData>
      </AttachmentData>
    </RelatedAttachment>

The same approach can be used to introduce derived metadata: a new “related material” is added to the list and, if necessary, a reference to a new “related attachment” can accommodate a metadata XML block that does not fit in the “related material” schema.
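Appending derived metadata in this way can be sketched with Python's standard xml.etree.ElementTree module. The root element ContentItem, the MediaType value "Metadata", the BagOfWords block and the helper add_derived_metadata are hypothetical; the related material mirrors the example above in simplified form and is not a validated MPEG-7 instance.

```python
import xml.etree.ElementTree as ET


def add_derived_metadata(item: ET.Element, attachment_id: str,
                         metadata_block: ET.Element) -> None:
    """Append a related material that references a new related attachment
    holding an XML block of derived metadata."""
    material = ET.SubElement(item, "RelatedMaterial")
    ET.SubElement(material, "MediaType").text = "Metadata"  # hypothetical value
    info = ET.SubElement(material, "MediaInformation")
    instance = ET.SubElement(ET.SubElement(info, "MediaProfile"), "MediaInstance")
    locator = ET.SubElement(instance, "InstanceLocator")
    ET.SubElement(locator, "MediaURL", idref=attachment_id)
    # The attachment actually stores the derived-metadata XML block.
    attachment = ET.SubElement(item, "RelatedAttachment", id=attachment_id)
    data = ET.SubElement(attachment, "AttachmentData", Storage="internal")
    data.append(metadata_block)


item = ET.Element("ContentItem")          # hypothetical root element
bow = ET.Element("BagOfWords")            # hypothetical derived-metadata block
ET.SubElement(bow, "Word", count="2").text = "asthma"
add_derived_metadata(item, "attachment_2", bow)

# Resolve the reference from the related material to its attachment.
url = item.find("RelatedMaterial/MediaInformation/MediaProfile/"
                "MediaInstance/InstanceLocator/MediaURL")
target = item.find(f"RelatedAttachment[@id='{url.get('idref')}']")
print(target.find("AttachmentData/BagOfWords/Word").text)  # → asthma
```

Because every block is reached through the same reference mechanism, the derived metadata are accessed exactly like the original image attachment.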

Although it is preferred, it is to be understood that the present invention is not limited to the described metadata representation.

Derived Metadata

Metadata other than pre-assigned metadata and raw context metadata are provided by the present invention and are called “derived metadata”, as they derive from content information and/or from context information (i.e., raw context metadata) within an interaction event, or as a result of one or more interaction events.

In particular, derived content metadata may derive directly from content items, i.e. from the content of content items, or indirectly from content items, e.g. from authored content metadata associated with content items. Derived content metadata are not directly available when a content item is published; they are generated later by a software program.

Similarly, derived context metadata may derive directly from context items, i.e. from the context of an interaction event, or indirectly from context items, i.e. from raw context metadata associated with context items. Derived context metadata are not directly available when the context of an interaction event is detected; they are generated later by a software program.

Derived metadata can provide more complete and usable information about the content and the context.

Derived metadata are particularly suitable for being processed automatically by a software program.

In the following, several examples of metadata derivation are mentioned.

An example of metadata derived from the content of a text item (i.e. the text itself) is the list of the words occurring in a text together with the number of occurrences of each word, called the “bag-of-words” representation of the text; such metadata give information about the overall lexical composition of the text.
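A bag-of-words representation can be computed with collections.Counter. This is a minimal sketch that assumes a simple tokenization rule (lower-casing and keeping runs of ASCII letters); real systems may also strip stop words or apply stemming.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Assumed tokenization: lower-case, then take runs of ASCII letters as words.
    return Counter(re.findall(r"[a-z]+", text.lower()))


bow = bag_of_words("The lungs, the model and the computer.")
print(bow["the"], bow["lungs"])  # → 3 1
```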

Other examples of metadata derived from the content of a text item (i.e. the text itself) comprise text metrics, i.e. numeric parameters computed on a text, such as the overall length of the text, the average length of the sentences or paragraphs belonging to the text, the average nesting depth of the syntactic structure, the Gunning Fog index (e.g. for an English text) and the Gulpease index (e.g. for an Italian text).
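Some of these text metrics can be sketched as follows. The sketch uses the standard formulas Fog = 0.4 × (words/sentences + 100 × complex words/words) and Gulpease = 89 + (300 × sentences − 10 × letters)/words, but assumes naive sentence splitting on ".!?" and a vowel-group heuristic for syllables, so the Fog value is only approximate (the full index also excludes proper nouns and some compound words).

```python
import re


def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per group of consecutive vowels (English-oriented).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_metrics(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    n_sent, n_words = len(sentences), len(words)
    n_letters = sum(len(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "length_chars": len(text),
        "avg_sentence_words": n_words / n_sent,
        # Gunning Fog: 0.4 * (words per sentence + 100 * share of 3+ syllable words)
        "gunning_fog": 0.4 * (n_words / n_sent + 100.0 * complex_words / n_words),
        # Gulpease: 89 + (300 * sentences - 10 * letters) / words
        "gulpease": 89 + (300 * n_sent - 10 * n_letters) / n_words,
    }


metrics = text_metrics("The cat sat. The dog ran.")
print(metrics["avg_sentence_words"])  # → 3.0
```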

Metadata derived from the content of an image item (i.e. the image itself) include e.g.:

- the luminance histogram, i.e. the distribution of luminous intensity over the pixels of a digital image, which gives information about brightness and contrast over the image;
- the colour histogram, i.e. the distribution of the primary colour components (Red, Green, Blue) over the pixels of a digital image, which gives information about the colour composition of the image;
- the spatial frequency components of an image, computed for example via the two-dimensional Fourier transform, which give information about the presence of patterns and textures in the image;
- geometric category metadata, generated e.g. by geometric hashing techniques, which give information about the presence in an image of shapes such as lines, arcs, ellipses and polygons;
- pattern category metadata, generated e.g. by pattern recognition algorithms, which give information about the presence in an image of specific features, such as human faces, animals, plants, landscapes, buildings, flags, technical drawings, paintings and comics;
- text metadata, generated e.g. by optical character recognition techniques, which give information about letters, numbers and words appearing in an image.
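The first two image metadata in the list can be sketched in plain Python over a list of 8-bit RGB pixels. The sketch assumes ITU-R BT.601 luma weights for luminance and a small number of equal-width bins; both choices are illustrative, not prescribed by the method.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def luminance_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[int]:
    hist = [0] * bins
    for r, g, b in pixels:
        y = 0.299 * r + 0.587 * g + 0.114 * b   # BT.601 luma weights (assumed)
        hist[min(int(y * bins / 256), bins - 1)] += 1
    return hist


def colour_histogram(pixels: Sequence[Pixel], bins: int = 4) -> Dict[str, List[int]]:
    # One histogram per primary colour component.
    hists = {"R": [0] * bins, "G": [0] * bins, "B": [0] * bins}
    for pixel in pixels:
        for channel, value in zip("RGB", pixel):
            hists[channel][value * bins // 256] += 1
    return hists


image = [(255, 0, 0), (0, 0, 255), (255, 255, 255), (0, 0, 0)]  # red, blue, white, black
print(luminance_histogram(image))  # → [2, 1, 0, 1]
print(colour_histogram(image)["G"])  # → [3, 0, 0, 1]
```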

Metadata derived from the content of a sound item (i.e. the sound itself) include e.g.:

- the audio frequency spectrum components, computed for example via the Fast Fourier Transform, which give information about the nature and composition of the sound;
- audio waveforms, which give information about the sound dynamics;
- pattern category metadata, generated e.g. by pattern recognition algorithms, which give information about the presence in an audio track of specific features, such as specific pieces of music, speech, claps and explosions;
- text metadata, generated e.g. by speech recognition techniques, which extract uttered words or sentences from an audio track.
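The frequency spectrum components can be illustrated with a direct discrete Fourier transform. This sketch computes the same DFT that an FFT would, but with the naive O(n²) sum, which is adequate only for short illustrative signals; the 8 Hz sampling rate and 2 Hz test tone are arbitrary choices.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    # Magnitudes of DFT bins 0..n/2 (the non-negative frequencies of a real signal).
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


rate = 8                                                             # samples per second
tone = [math.sin(2 * math.pi * 2 * t / rate) for t in range(rate)]   # 2 Hz sine, 1 second
spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin * rate / len(tone))  # → 2.0 (dominant frequency in Hz)
```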

Metadata can be derived from the content of a video item (i.e. the video itself) through the use of specific analysis techniques and algorithms.

Scene segmentation analysis techniques can give information about the time structure of a video. For example, this kind of analysis can tell that a movie is composed of a given number of scenes, a percentage of which is characterized by strong motion and another percentage by the presence of loud music.

Moving object recognition algorithms can give information about the presence in a video of specific objects characterized by specific motion behaviours, such as walking people, speaking or singing people, moving cars, falling objects and opening doors.

If a video is decomposed into a sequence of still images, some of the metadata extraction techniques used for still images can also be applied to a video, provided that there is a way of averaging the resulting metadata over the sequence of images.
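One simple way of averaging per-frame metadata is an element-wise mean over the frames. This sketch assumes each frame has already produced a fixed-length numeric vector, for example a normalised histogram; the two frame vectors are invented for illustration.

```python
from typing import List, Sequence


def average_frame_metadata(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of per-frame metadata vectors (e.g. normalised histograms)."""
    if not per_frame:
        raise ValueError("at least one frame is required")
    n = len(per_frame)
    return [sum(values) / n for values in zip(*per_frame)]


frames = [[0.5, 0.5, 0.0], [0.25, 0.25, 0.5]]  # per-frame normalised histograms
print(average_frame_metadata(frames))  # → [0.375, 0.375, 0.25]
```

An element-wise mean keeps the result in the same format as the still-image metadata, so the same downstream processing can be reused for videos.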

Metadata derived from a 3D model include e.g.:

- total area, total volume, convexity, fractal dimension;
- pattern category metadata, generated e.g. by pattern recognition algorithms, which give information about the presence of specific 3D shapes, such as boxes, pipes, wheels, wires, humanoid shapes and object shapes.
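Total area and total volume can be derived from a triangle mesh as sketched below, assuming the 3D model is a closed mesh with consistently oriented triangles. The volume uses the standard signed-tetrahedron (divergence theorem) sum, which is only meaningful under that closed-mesh assumption; the unit tetrahedron is a test shape, not data from the source.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def area_and_volume(vertices: Sequence[Vec],
                    triangles: Sequence[Tuple[int, int, int]]) -> Tuple[float, float]:
    area = volume = 0.0
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        n = cross(sub(b, a), sub(c, a))
        area += 0.5 * math.sqrt(dot(n, n))    # triangle area = |AB x AC| / 2
        volume += dot(a, cross(b, c)) / 6.0   # signed volume of tetrahedron (origin, a, b, c)
    return area, abs(volume)


# Unit right tetrahedron with outward-oriented faces.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
area, volume = area_and_volume(verts, faces)
print(round(area, 4), round(volume, 4))  # → 2.366 0.1667
```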

As already mentioned, metadata can in general be derived from any other metadata.

For example, starting from numerical metadata, symbolic ranges may be generated through the use of discretization techniques that group numerical values; this gives a more compact and semantic representation of numerical values.
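A minimal discretization sketch follows, using fixed cut points and bisect to map a numerical metadatum (here, movie duration in minutes) to a symbolic range. The cut points 60 and 120 and the labels are assumptions chosen for illustration; data-driven schemes such as equal-frequency binning would choose the cut points automatically.

```python
from bisect import bisect_right
from typing import Sequence


def discretize(value: float, edges: Sequence[float], labels: Sequence[str]) -> str:
    """Map a number to a symbolic range. `edges` are sorted cut points and
    `labels` has len(edges) + 1 entries; each range includes its lower edge."""
    return labels[bisect_right(edges, value)]


duration_edges = [60, 120]                      # minutes; assumed cut points
duration_labels = ["short", "medium", "long"]
print([discretize(m, duration_edges, duration_labels) for m in (45, 95, 120, 180)])
# → ['short', 'medium', 'long', 'long']
```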

Metadata can also be derived through the use of ontologies. An ontology is the formalization of a conceptualization, using a machine-readable representation. Ontologies can be used to organize taxonomies of and relationships among metadata; this allows a user preference model to be built on higher-order semantic categories and concepts.

Information about ontologies can be found e.g. on the web site of the World Wide Web Consortium (W3C), currently at the address "http://www.w3.org", and in the book "WordNet: An Electronic Lexical Database", edited by Christiane Fellbaum, MIT Press, May 1998.

A couple of examples relating to metadata derived through ontologies are provided in the following.

A simple ontology for time is the classification of time values into "day-time" and "night-time"; a simple ontology for dates is the classification of date values into "working-day" and "weekend-day". Raw time and date values are not meaningful in the construction of a user preference model, whereas "day-time/night-time" and "working-day/weekend-day" metadata can be much more effective. For example, a user preference model can state that a user enjoys interacting with content items associated with a certain category on weekend nights and dislikes them on working days, both during the day and during the night.
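The two simple ontologies can be sketched as derivation functions over a raw timestamp. The text does not say where "day-time" begins and ends, so the 07:00 to 19:00 window below is a hypothetical choice, as are the function names.

```python
from datetime import datetime

# Hypothetical boundaries: the text does not fix the limits of "day-time".
DAY_START_HOUR = 7
DAY_END_HOUR = 19


def time_ontology(ts: datetime) -> str:
    """Classify a raw time value as "day-time" or "night-time"."""
    return "day-time" if DAY_START_HOUR <= ts.hour < DAY_END_HOUR else "night-time"


def date_ontology(ts: datetime) -> str:
    """Classify a raw date value as "working-day" or "weekend-day"."""
    # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return "weekend-day" if ts.weekday() >= 5 else "working-day"


def derive_context(ts: datetime) -> dict:
    """Derived context metadata for one interaction event."""
    return {"time": time_ontology(ts), "date": date_ontology(ts)}


# Saturday 29 October 2005, 22:30
print(derive_context(datetime(2005, 10, 29, 22, 30)))
```

Public holidays, which are neither weekends nor working days in practice, would need a calendar lookup that this sketch omits.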

Ontologies are particularly suitable for generating categorical metadata. Consider, for example, three text items A, B and C, each containing a news article. Article A talks about computer models of human lungs and contains (among many others) the metadata words "computer" and "asthma". Article B talks about robot-assisted surgery and contains the metadata words "software" and "surgeon". Article C talks about the Internet and contains the metadata word "website". By linking the metadata words to a lexical ontology, articles A and B could be augmented, for instance, with the abstract category "medicine", which can be added to the metadata of both articles. Analogously, the metadata of all three articles A, B and C could be augmented with the abstract categories "computer science" and "technology". Therefore, the interest of a user in these articles can be related to abstract categories rather than to single words.
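The augmentation of articles A, B and C can be sketched with a toy word-to-category mapping. The mapping below is a hypothetical stand-in for a real lexical ontology, which would be far richer and would reach categories through hypernym chains rather than direct lookups.

```python
# Hypothetical fragment of a lexical ontology: word -> abstract categories.
WORD_CATEGORIES = {
    "computer": {"computer science", "technology"},
    "asthma": {"medicine"},
    "software": {"computer science", "technology"},
    "surgeon": {"medicine"},
    "website": {"computer science", "technology"},
}


def augment_with_categories(words):
    """Return the metadata words extended with their abstract categories."""
    categories = set()
    for word in words:
        categories |= WORD_CATEGORIES.get(word, set())
    return set(words) | categories


article_a = augment_with_categories(["computer", "asthma"])
article_b = augment_with_categories(["software", "surgeon"])
article_c = augment_with_categories(["website"])
print(sorted(article_a))
```

As in the text, A and B gain "medicine", while all three gain "computer science" and "technology".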

Metadata Derivation

An important aspect of the present invention is the generation (i.e. derivation) of metadata. This is achieved on the basis of derivation rules.

A derivation rule corresponds to an algorithm applied to content items and/or to context information and/or to metadata; the data to which a derivation rule is applied are generally referred to as its source.

A derivation rule specifies the algorithm to be applied, by reference to the plug-in module implementing it, and the sources to be processed.

In particular, the following types of sources may be provided:

- content components;
- authored content metadata;
- raw context metadata;
- derived metadata (i.e. metadata obtained from other derivation rules);
- extended analysis.

A derivation rule can be described, for example, by means of an XML (Extensible Markup Language) document that specifies the aforementioned elements, i.e. the module implementing the algorithm to be used together with its required parameters, and the input sources to be used.
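A metadata generator has to read such documents before it can invoke the referenced plug-in. The sketch below shows one hypothetical way of loading a rule with Python's standard `xml.etree.ElementTree`; the returned dictionary layout is an assumption, and nested extended-analysis sources are not interpreted.

```python
import xml.etree.ElementTree as ET

RULE = """
<DerivationRule ID="Body_BagOfWords">
  <Module name="BagOfWords">
    <Parameter name="language">Italian</Parameter>
  </Module>
  <Source type="ContentComponent">news.body</Source>
  <Destination type="DerivedMetadata">bodyBow</Destination>
</DerivationRule>
"""


def parse_rule(xml_text):
    """Load a derivation rule into a plain dict (hypothetical layout)."""
    root = ET.fromstring(xml_text.strip())
    module = root.find("Module")
    return {
        "id": root.get("ID"),
        "module": module.get("name"),
        "parameters": {p.get("name"): p.text for p in module.findall("Parameter")},
        # Only direct <Source> children; an ExtendedAnalysis source would
        # need its nested elements parsed separately.
        "sources": [(s.get("type"), s.text) for s in root.findall("Source")],
        "destination": root.find("Destination").text,
    }


print(parse_rule(RULE))
```

The parser only succeeds on well-formed XML, which is one reason the rule examples in this section must use straight quotes and properly closed elements.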

Some examples of derivation rules are given in the following.

The first set of examples relates to a news browser application. In this application, the content items are news articles. Each content item contains two content components: the title and the body of the news article, both in text form. The content of each content item is associated with the article's date, category, source and author's name as authored metadata; specifically, these metadata are included in the content item.

A first metadata derivation rule is defined as follows:

    <DerivationRule ID="Body_BagOfWords">
      <Module name="BagOfWords">
        <Parameter name="language">Italian</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">bodyBow</Destination>
    </DerivationRule>

This rule prescribes taking the textual data constituting the body of the news article as input (Source) and generating its "bag of words" representation as output (Destination), i.e. the list of the words occurring in the body together with the number of occurrences of each word, by executing the algorithm provided by the plug-in module named "BagOfWords".
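A bag-of-words representation is simply a word-count table. The sketch below is one minimal version; it lowercases words and does no stop-word removal or stemming, which a real "BagOfWords" module may well do differently.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map each lowercased word of `text` to its number of occurrences."""
    # For str patterns, \w is Unicode-aware, so accented Italian letters match.
    return Counter(word.lower() for word in re.findall(r"\w+", text))


print(bag_of_words("Il medico visita il paziente. Il paziente ringrazia."))
```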

A second metadata derivation rule is defined as follows:

    <DerivationRule ID="Body_WordOntology">
      <Module name="WordOntology">
        <Parameter name="ontology">LexicalOntology1</Parameter>
        <Parameter name="language">Italian</Parameter>
        <Parameter name="hypernyms">2</Parameter>
        <Parameter name="topsemanticlevel">true</Parameter>
        <Parameter name="minoccur">3</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">bodyWordOnto</Destination>
    </DerivationRule>

This rule prescribes to get the textual data constituting the body ofthe news article as input and to generate a related set of concept wordsas output, by executing the algorithm provided by the plug-in modulenamed “WordOntology”.

This rule executes the following steps:

- generating a "bag of words" from the source text;
- matching each word of the obtained "bag of words" against the lexical ontology "LexicalOntology1" and, for each matching word, extracting:
  - the hypernyms of the word, up to two levels above it (hypernyms=2). A lexical ontology such as LexicalOntology1 is organized as a tree: moving from the leaves towards the root means moving from words with a specific meaning towards words representing more abstract concepts. Given a word, the words located above it in the tree are called its "hypernyms" (e.g. "doctor" → "person" → "living being"; "doctor" → "professional" → "worker"). In fact, an ontology can contain more than one tree, so a word can have several chains of hypernyms;
  - the top-level semantic categories associated with the word (topsemanticlevel="true"); for example, the word "doctor" belongs to the top-level semantic category "Medicine";
- counting the occurrences of the extracted concepts (hypernyms and top-level semantic categories) and keeping only the concepts with at least three occurrences (minoccur=3), in order to limit the number of resulting concepts to the most relevant ones.
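The three steps can be sketched over a toy ontology. Everything about the ontology here is an assumption: it is a single tree stored as a word-to-parent map (so each word has one hypernym chain), a separate map gives top-level categories, and each concept is credited with the occurrence count of the word it came from.

```python
from collections import Counter

# Hypothetical toy ontology: word -> parent (hypernym); roots have no entry.
PARENT = {
    "surgeon": "doctor",
    "doctor": "professional",
    "nurse": "professional",
    "professional": "worker",
    "worker": "person",
}
# Hypothetical top-level semantic categories.
TOP_CATEGORY = {"doctor": "Medicine", "surgeon": "Medicine", "nurse": "Medicine"}


def hypernyms(word, levels):
    """Walk up the tree from `word`, collecting at most `levels` ancestors."""
    found = []
    while len(found) < levels and word in PARENT:
        word = PARENT[word]
        found.append(word)
    return found


def word_ontology(bag, levels=2, top_semantic_level=True, min_occur=3):
    """Turn a bag of words into concept counts, keeping frequent concepts only."""
    concepts = Counter()
    for word, count in bag.items():
        if word not in PARENT and word not in TOP_CATEGORY:
            continue  # no match in the ontology
        for concept in hypernyms(word, levels):
            concepts[concept] += count
        if top_semantic_level and word in TOP_CATEGORY:
            concepts[TOP_CATEGORY[word]] += count
    return {c: n for c, n in concepts.items() if n >= min_occur}


print(word_ontology({"doctor": 2, "nurse": 1, "surgeon": 1, "pizza": 5}))
# {'professional': 4, 'worker': 3, 'Medicine': 4}
```

"doctor" itself reaches only one occurrence (as a hypernym of "surgeon") and is therefore discarded by the minoccur=3 filter.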

The overall result is a semantic representation of the body text of the news article, in the form of a set of concept words from the lexical ontology, each with its number of occurrences.

The following third and fourth derivation rules are similar to the first and second derivation rules above, respectively, but their source is the title of the news item instead of the body. It should be noted, however, that the metadata generated from the title form a separate metadata set from the metadata generated from the body, as declared in the "Destination" element.

    <DerivationRule ID="Title_BagOfWords">
      <Module name="BagOfWords">
        <Parameter name="language">Italian</Parameter>
      </Module>
      <Source type="ContentComponent">news.title</Source>
      <Destination type="DerivedMetadata">titleBow</Destination>
    </DerivationRule>

    <DerivationRule ID="Title_WordOntology">
      <Module name="WordOntology">
        <Parameter name="ontology">LexicalOntology1</Parameter>
        <Parameter name="language">Italian</Parameter>
        <Parameter name="hypernyms">2</Parameter>
        <Parameter name="topsemanticlevel">true</Parameter>
        <Parameter name="minoccur">3</Parameter>
      </Module>
      <Source type="ContentComponent">news.title</Source>
      <Destination type="DerivedMetadata">titleWordOnto</Destination>
    </DerivationRule>

The following rule generates a numerical metadata corresponding to the length of the news body, expressed as the number of words in it, by executing the algorithm provided by the plug-in module "TextMetrics".

    <DerivationRule ID="Body_Length">
      <Module name="TextMetrics">
        <Parameter name="metric">textLength</Parameter>
        <Parameter name="unit">word</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">bodyTextLength</Destination>
    </DerivationRule>

The following rule generates a numerical metadata corresponding to the average sentence length in the body of the news article.

    <DerivationRule ID="Body_AvgSenLen">
      <Module name="TextMetrics">
        <Parameter name="metric">AvgSenLen</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">avgSenLen</Destination>
    </DerivationRule>
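The two metrics named in the rules above ("textLength" with unit "word", and "AvgSenLen") can be sketched as follows. The word and sentence boundaries used here (runs of word characters; splits on `.`, `!` and `?`) are simplifying assumptions, and abbreviations such as "e.g." would be miscounted as sentence ends.

```python
import re


def text_length(text):
    """Number of words in `text` (textLength metric, unit = word)."""
    return len(re.findall(r"\w+", text))


def avg_sentence_length(text):
    """Average number of words per sentence (AvgSenLen metric)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return text_length(text) / len(sentences)


body = "The market fell. Analysts were surprised!"
print(text_length(body), avg_sentence_length(body))  # 6 3.0
```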

The following rule generates a numerical metadata expressing the estimated readability (ease of reading) of the body of the news article.

    <DerivationRule ID="Body_ReadabilityIndex">
      <Module name="TextMetrics">
        <Parameter name="metric">ReadabilityIndex</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">bodyReadability</Destination>
    </DerivationRule>

The following three rules generate three numerical metadata, corresponding to the number of occurrences in the news body text of dates, figures (numbers, percentages, prices) and people's names.

    <DerivationRule ID="Body_Dates">
      <Module name="TextMetrics">
        <Parameter name="metric">Dates</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">bodyDates</Destination>
    </DerivationRule>

    <DerivationRule ID="Body_Numbers">
      <Module name="TextMetrics">
        <Parameter name="metric">Figures</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">bodyFigures</Destination>
    </DerivationRule>

    <DerivationRule ID="Body_PeopleNames">
      <Module name="TextMetrics">
        <Parameter name="metric">PeopleNames</Parameter>
      </Module>
      <Source type="ContentComponent">news.body</Source>
      <Destination type="DerivedMetadata">bodyPNames</Destination>
    </DerivationRule>

The second set of examples relates to a music catalogue browser application. In this application the items are musical pieces. Each item contains a single content component: the audio file encoding the musical piece (e.g. in mp3 format). Each item also contains the title, date, genre, performers' names and authors' names of the musical piece as authored metadata.

Four metadata derivation rules of the second set are defined as follows:

    <DerivationRule ID="MusicBeatSpeed">
      <Module name="AdvancedSpectralAnalysis">
        <Parameter name="metric">BeatSpeed</Parameter>
      </Module>
      <Source type="ContentComponent">music.audiofile</Source>
      <Destination type="DerivedMetadata">BeatSpeed</Destination>
    </DerivationRule>

    <DerivationRule ID="MusicVocalImpact">
      <Module name="AdvancedSpectralAnalysis">
        <Parameter name="metric">VocalImpact</Parameter>
      </Module>
      <Source type="ContentComponent">music.audiofile</Source>
      <Destination type="DerivedMetadata">VocalImpact</Destination>
    </DerivationRule>

    <DerivationRule ID="MusicSoundBrightness">
      <Module name="AdvancedSpectralAnalysis">
        <Parameter name="metric">SoundBrightness</Parameter>
      </Module>
      <Source type="ContentComponent">music.audiofile</Source>
      <Destination type="DerivedMetadata">SoundBrightness</Destination>
    </DerivationRule>

    <DerivationRule ID="MusicLoudnessDynamic">
      <Module name="AdvancedSpectralAnalysis">
        <Parameter name="metric">LoudnessDynamic</Parameter>
      </Module>
      <Source type="ContentComponent">music.audiofile</Source>
      <Destination type="DerivedMetadata">LoudnessDynamic</Destination>
    </DerivationRule>

The rules above generate four numerical metadata expressing, respectively, the "beat speed" (i.e. a quantitative measure of the rhythmic features of the musical piece), the "vocal impact" (i.e. the weight, in the musical piece, of the human voice component with respect to the instrumental component), the "sound brightness" (i.e. a quantitative measure of the sound's brilliance) and the "loudness dynamic" (i.e. a quantitative measure of the variation over time of the sound's loudness). This can be done through spectral analysis techniques applied to the audio signal encoded in the audio file, by executing the algorithm provided by the plug-in module named "AdvancedSpectralAnalysis".

The following rule generates a text label metadata expressing the decade of the musical piece, starting from the piece's year. This is done by a discretization technique, i.e. by collapsing ranges of numerical metadata (in this case, the year) into a synthetic description (the decade), provided by the plug-in module named "NumericalDiscretizer".

    <DerivationRule ID="MusicDecade">
      <Module name="NumericalDiscretizer">
        <Parameter name="intervals">-1960=old, 1961-1970=sixties, 1971-1980=seventies, 1981-1990=eighties, 1991-2000=nineties, 2001-=recent</Parameter>
      </Module>
      <Source type="AuthoredMetadata">AuthMetadata.year</Source>
      <Destination type="DerivedMetadata">Decade</Destination>
    </DerivationRule>
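The "intervals" parameter above can be read as a list of closed ranges, where an empty bound means "unbounded". A minimal sketch of parsing it and discretizing a year follows; it assumes non-negative integer bounds (a leading minus sign is read as "no lower bound", so negative values are not supported), and the function names are hypothetical.

```python
def parse_intervals(spec):
    """Parse e.g. '-1960=old, 1961-1970=sixties, 2001-=recent'.

    Returns a list of (low, high, label) with inclusive bounds;
    None stands for an unbounded side.
    """
    intervals = []
    for item in spec.split(","):
        bounds, label = item.strip().split("=")
        low, high = bounds.split("-")
        intervals.append((int(low) if low else None, int(high) if high else None, label))
    return intervals


def discretize(value, intervals):
    """Return the label of the first interval containing `value`, else None."""
    for low, high, label in intervals:
        if (low is None or value >= low) and (high is None or value <= high):
            return label
    return None


SPEC = ("-1960=old, 1961-1970=sixties, 1971-1980=seventies, "
        "1981-1990=eighties, 1991-2000=nineties, 2001-=recent")
intervals = parse_intervals(SPEC)
print(discretize(1975, intervals))  # seventies
```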

The following rule generates a numerical metadata expressing a rough estimate of the popularity of the piece's main performer. This is obtained by submitting the main performer's name to a web search engine, through the plug-in module named "SearchEngineQuery", and taking the estimated number of hits (web pages containing the name) as the result.

    <DerivationRule ID="MusicPerformerPopularity">
      <Module name="SearchEngineQuery">
        <Parameter name="searchengine">yyy</Parameter>
        <Parameter name="metric">EstimatedHitsNumber</Parameter>
      </Module>
      <Source type="AuthoredMetadata">AuthMetadata.MainPerformerName</Source>
      <Destination type="DerivedMetadata">PPopularity</Destination>
    </DerivationRule>

Derivation Rules Using Extended Analysis Sources

Extended analysis provides a special kind of source that can be leveraged by derivation rules. This type of source specifies the application of analysis procedures over a whole subset of the items contained in a repository (possibly all items), in order to obtain an overall analysis of the domain's structure. In other words, derivation rules using extended analysis are not limited to extracting metadata from the information contained in a single item, but can generate, for each item, new metadata that take into account the overall structure of the domain of the items themselves. The domain is the field of application of the derivation technique, i.e. the field of providing personalized selected content to the user.

The algorithms actually performing the analysis specified by an extended analysis can be executed by a dedicated software module or by a general-purpose metadata generator module; if a dedicated module is provided, it can be a "plug-in" module invoked by the general-purpose metadata generator module when needed.

The following is a derivation rule based on extended analysis; it operates on the music catalogue application mentioned above.

    <DerivationRule ID="NearestAudioClusters">
      <Module name="NearestClusters">
        <Parameter name="numclusters">1</Parameter>
        <Parameter name="distancemetric">euclidean</Parameter>
      </Module>
      <Source type="DerivedMetadata">VocalImpact</Source>
      <Source type="DerivedMetadata">BeatSpeed</Source>
      <Source type="DerivedMetadata">SoundBrightness</Source>
      <Source type="DerivedMetadata">LoudnessDynamic</Source>
      <Source type="ExtendedAnalysis">
        <ExtendedAnalysis ID="SpectralFeatureClusterAnalysis">
          <ContentRange type="query">All</ContentRange>
          <Module name="NumericalClusterAnalysis">
            <Parameter name="method">Ward</Parameter>
            <Parameter name="maxclusters">10</Parameter>
          </Module>
          <Source type="DerivedMetadata">VocalImpact</Source>
          <Source type="DerivedMetadata">BeatSpeed</Source>
          <Source type="DerivedMetadata">SoundBrightness</Source>
          <Source type="DerivedMetadata">LoudnessDynamic</Source>
        </ExtendedAnalysis>
      </Source>
      <Destination type="DerivedMetadata">NearestAudioCluster</Destination>
    </DerivationRule>

The analysis specified as an extended analysis in the above derivation rule takes as input four derived numerical metadata obtained through the application of the derivation rules set out in the previous examples. These four numerical values describe four relevant audio features of each musical piece.

The extended analysis specifies that a cluster analysis is to be performed over all the items (i.e. the musical pieces) contained in the content repository. Cluster analysis is a known statistical technique used to identify groups (i.e. clusters) of items characterized by similar values, such groups representing interesting regularities in the items' domain. In the case of a musical domain, the resulting clusters could group musical pieces sharing a similar audio character, e.g. closeness to the same musical genre.

The algorithm actually executing the analysis is provided by the plug-in module named "NumericalClusterAnalysis".

In particular, the extended analysis in this example specifies the cluster analysis method to be applied (Ward's method), the range of items to which the analysis is applied (all the items in the repository) and the maximum number of clusters to be extracted (10 clusters).

This analysis allows the derivation rule to generate, for a given item (musical piece), a metadata indicating which cluster is closest to it. This new metadata identifies the location of the musical piece in the "musical landscape" represented by the clusters.
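The per-item step of the "NearestClusters" module can be sketched as a nearest-centroid lookup with Euclidean distance. The sketch assumes the cluster analysis (Ward's method in the rule) has already produced one centroid per cluster, and that the four features are already scaled to comparable ranges; the centroid values and cluster IDs are made up for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_clusters(item, centroids, num_clusters=1):
    """IDs of the `num_clusters` centroids closest to `item`, nearest first."""
    ranked = sorted(centroids, key=lambda cid: euclidean(item, centroids[cid]))
    return ranked[:num_clusters]


# Hypothetical centroids over (VocalImpact, BeatSpeed, SoundBrightness,
# LoudnessDynamic), each feature scaled to [0, 1].
centroids = {
    "c1": (0.8, 0.9, 0.7, 0.6),
    "c2": (0.2, 0.3, 0.4, 0.2),
    "c3": (0.5, 0.5, 0.5, 0.5),
}
piece = (0.75, 0.85, 0.7, 0.55)
print(nearest_clusters(piece, centroids))  # ['c1']
```

Without scaling, a feature such as beats per minute (values around 60 to 140) would dominate the distance over features in [0, 1].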

Machine Learning Methods

Machine learning methods allow a computer system to perform automatic learning (i.e. through software programs) from a set of factual data belonging to a specific application field (i.e. domain). Given such a data set, machine learning methods are able to extract patterns and relationships from the data themselves.

Machine learning methods encode the learned patterns and relationships in a formal, quantitative model, which can take different forms depending on the machine learning technique used. Examples of model forms include logic rules, mathematical equations and mathematical graphs.

One goal of machine learning methods is a better understanding and quantification of the patterns within data and of the relationships between data, in order to obtain a model representing the data.

Most machine learning methods use the feature vector representation. When these methods are applied to build a predictive model of user preferences related to the provided content, each feature vector is associated with a content item and comprises:

- independent features, each corresponding to a derived or pre-assigned metadata associated with that content item and, preferably, also to a raw or derived context metadata; and
- target feature(s), representing a score provided by the user as (explicit or implicit) feedback on the provided content. For example, the feedback is a numerical value from 1 to 10, where a high value corresponds to positive feedback.

Each instance of the data set is then represented as a vector of features. In the case of a single target feature, the vector representing an instance is (n+1)-dimensional and takes the following form:

    <feature 1, feature 2, ..., feature n, target feature>

The feature vector model is a formal representation of the domain data and is suitable for most machine learning methods.

An extensive discussion of machine learning methods and their applications can be found in Tom Mitchell, "Machine Learning", McGraw-Hill, 1997.

The data set to be processed by machine learning methods in order to build the predictive models of the user profiles preferably comprises content metadata (authored and derived) and, preferably, context metadata (raw and derived) as independent features; the user feedback is the target feature. The goal of the machine learning methods is to find a model (referred to as user model or predictive model) predicting user preferences, i.e. a machine learning model expressing the relationships between the metadata and the user feedback. The predictive model thus obtained can then be used to estimate the user's evaluation of new content items (with their new metadata) when they become available.

An instance of the data set, represented by a feature vector, corresponds to a single interaction event in which a user expresses his preference with respect to one content item, and takes the following form:

    <content metadata 1, ..., content metadata m, context metadata 1, ..., context metadata p, user vote>

where m + p = n.

If the user expresses preferences for a plurality of content items, a plurality of feature vectors is created, which can be formally represented by an (n+1)×q matrix, where q is the number of interaction events. For instance, if the user has expressed his/her preference for 10 content items, an (n+1)×10 matrix is created. The selected machine learning algorithm is then applied to the matrix.
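Assembling the (n+1)×q matrix from recorded interaction events can be sketched as below: one column per event, with the n independent features first and the target feature in the last row. The event dictionaries and feature names are hypothetical; most machine learning libraries expect the transposed q×(n+1) layout, which is just the list of per-event vectors.

```python
def build_matrix(events, feature_names, target_name):
    """Return an (n+1) x q matrix as a list of rows; each column is one event."""
    vectors = [[event[name] for name in feature_names] + [event[target_name]]
               for event in events]
    # Transpose the q vectors of length n+1 into n+1 rows of length q.
    return [list(row) for row in zip(*vectors)]


events = [  # hypothetical interaction events
    {"genre": "rock", "beat_speed": 128, "time_of_day": "day", "vote": 9},
    {"genre": "jazz", "beat_speed": 65, "time_of_day": "night", "vote": 7},
    {"genre": "dance", "beat_speed": 125, "time_of_day": "night", "vote": 2},
]
matrix = build_matrix(events, ["genre", "beat_speed", "time_of_day"], "vote")
for row in matrix:
    print(row)
```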

Several well-known machine learning methods can be useful for this purpose, including decision trees, association rules, neural networks and Bayesian methods, as well as methods specifically designed for the task of building a user preference model.

An example of the use of a machine learning method in building a user profile is given in the following, with reference to the music catalogue application mentioned before.

In this simple example, the content item (a musical piece) is represented by two pieces of metadata:

- MusicGenre, the musical genre (provided as authored metadata);
- MusicBeatSpeed, the number of beats per minute of the musical piece (provided as derived metadata by the application of the "MusicBeatSpeed" derivation rule).

The interaction context is represented by the following (derived) context metadata:

- "Time", which can take either of the two values "day" or "night" and tells whether the user's interaction with the content item happened during the day or during the night (provided as derived metadata by the application of a derivation rule based on a simple temporal ontology).

The user preference is given by the following feature:

- "UserVote", which can take either of the two values "like" or "dislike" and tells whether the user gave a positive or a negative score to the musical piece.

Thus, with reference to the usual machine learning representation of the domain seen before, a single event of a user expressing his preference with respect to a musical piece takes the following vector form:

    <MusicGenre, MusicBeatSpeed, Time, UserVote>

The following data set of user/item interactions is given by example:

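| ID | Genre | MusicBeatSpeed | Time | UserVote |
|----|-----------|-----|-------|---------|
| 1 | rock | 128 | day | like |
| 2 | dance | 130 | day | like |
| 3 | dance | 125 | night | dislike |
| 4 | dance | 130 | night | dislike |
| 5 | rock | 130 | night | dislike |
| 6 | classical | 55 | day | dislike |
| 7 | classical | 60 | day | dislike |
| 8 | dance | 70 | night | like |
| 9 | jazz | 65 | night | like |
| 10 | classical | 75 | night | like |
| 11 | jazz | 60 | night | like |
| 12 | rock | 125 | day | dislike |
| 13 | dance | 135 | night | like |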

The application of a decision tree machine learning method to the above data set generates a user preference model consisting of the following rules:

- IF Time = "day" AND MusicBeatSpeed >= 125 THEN UserVote = "like"
- IF Time = "night" AND MusicBeatSpeed <= 75 THEN UserVote = "like"

It is to be noted that the above user preference rules, used for generating predicted scores, are not the derivation rules used for generating derived metadata.

This simple predictive model, expressed in terms of user preference rules, indicates that this specific user likes very fast music (MusicBeatSpeed >= 125) during the day, while preferring calmer music (MusicBeatSpeed <= 75) during the night.

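It is to be noted that this model accounts for the vast majority of the cases in the above data set, but not for all of them: case 12 (rock, 125, day, "dislike") contradicts the first rule, and case 13 (dance, 135, night, "like") is covered by neither rule.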
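The behaviour of the two rules on the data set can be checked mechanically. The sketch below encodes the table and the rules; since the rules as stated only say when the vote is "like", it assumes that every case matched by neither rule is predicted as "dislike".

```python
# The data set above as (ID, genre, MusicBeatSpeed, Time, UserVote).
DATA = [
    (1, "rock", 128, "day", "like"),
    (2, "dance", 130, "day", "like"),
    (3, "dance", 125, "night", "dislike"),
    (4, "dance", 130, "night", "dislike"),
    (5, "rock", 130, "night", "dislike"),
    (6, "classical", 55, "day", "dislike"),
    (7, "classical", 60, "day", "dislike"),
    (8, "dance", 70, "night", "like"),
    (9, "jazz", 65, "night", "like"),
    (10, "classical", 75, "night", "like"),
    (11, "jazz", 60, "night", "like"),
    (12, "rock", 125, "day", "dislike"),
    (13, "dance", 135, "night", "like"),
]


def predict(beat_speed, time):
    """The two learned preference rules; "dislike" is an assumed default."""
    if time == "day" and beat_speed >= 125:
        return "like"
    if time == "night" and beat_speed <= 75:
        return "like"
    return "dislike"


mismatches = [case_id for case_id, _, bpm, time, vote in DATA if predict(bpm, time) != vote]
print(mismatches)  # [12, 13]
```

Under that default the model disagrees with cases 12 and 13. Case 12 is a genuine contradiction of the first rule, whereas case 13 counts as a mismatch only because of the assumed default.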

Detailed Description of an Embodiment

In the following, a detailed description will be provided of an advantageous embodiment of the present invention with specific reference to the block diagram of FIG. 1 (service application); in this diagram two symbols are used: a square representing a software module and a cylinder representing a repository.

This embodiment refers to a service provider that offers a content-based service to users. This service can be of PULL type, of PUSH type, or both.

The content-based service provides selected content items to the users. A content item may be delivered by the service provider to the user directly or indirectly, for example by providing an address (e.g. an Internet address) where the content item is located or can be accessed. Content items are generally provided directly by content providers through a communication network, such as a packet-data network (e.g. the Internet) or a mobile telephone network (e.g. the UMTS network).

The content-based service provides for building and maintaining user profiles in order to offer a better selection of content items.

The present content-based service can be divided into two activities:

- processing content items;
- processing user profiles.

Processing Content Items

The activity of processing content items comprises:

- receiving requests from a user;
- selecting content items on the basis of the user's requests;
- preferably, formatting and presenting the selected content items to the user according to the user's presentation profile (i.e. personalizing the presentation of content according to the service application logic); and
- providing the selected content items to the user.

In PULL mode, as soon as a request from a user is received, the service provider identifies a set of content items based on this request and then carries out the above steps on this set of content items; this means that selected content items are usually provided as replies to the user shortly after the user's request.

In PUSH mode, the service provider receives requests from a user and stores them without immediate reply and typically without immediate processing; then two possibilities exist. According to the first possibility, the service provider periodically identifies all newly published content items and then carries out the above steps on all of them. According to the second possibility, the service provider identifies each content item as soon as it is published and then carries out the above steps on the newly published content item. In PUSH mode, the provision of content items may be carried out in two steps: first, the service provider simply notifies the user that some content items of interest to him are available; then it sends these content items (directly or indirectly) as soon as the user expresses his wish to receive them. The user might also express his wish to receive only some of these content items.

Every input from a user is received and processed by a service front end module (SFEM), e.g. a PC or mobile terminal. User requests are sent to a service application logic module (SALM), which embeds the logic specific to the content-based service to be provided. Additionally, module SFEM sends to module SALM the raw context metadata associated with the user requests (e.g. date and time, user location, etc.).

When module SALM receives a user request from module SFEM, it generates a corresponding content query (STEP 201 in FIG. 2). Based on this content query (deriving from the user request) and on its service logic, module SALM identifies a first set of content items (STEP 202 in FIG. 2) in a content items repository CIR. For instance, following the content query, the service application logic identifies in repository CIR a first set of content items related to movies and TV series.

Repository CIR stores content items as well as the content metadata pre-assigned to the stored content items (typically authored metadata). As explained in more detail in the following section, repository CIR can also store the derived content metadata associated with the content items, resulting from previous metadata generation, e.g. triggered by previous content queries.

If the first set of content items were to be filtered on the basis of user feedback alone, usability concerns would arise, since the amount of input that the service application can ask of the user, such as explicit preferences ("I like" or "I do not like"), is generally limited. Moreover, in many cases a precise formalization of such input is unfeasible.

In order to avoid poor filtering, and with the final goal of retaining accurate content for the user, module SALM asks a match-maker module (MMM) to produce a ranking of the content items of said first set (STEP 203 in FIG. 2) based on a user profile. Module SALM can then use the ranking received from module MMM to filter out low-scored content items, to choose the best-scored content items and to reorder the retained content items. This can be done according to known methods.

Therefore, module SALM selects a second set of content items within the identified first set by filtering the content items of this first set (STEP 204 in FIG. 2) with the help of module MMM. Module MMM is the key element of the filtering activity and is responsible for taking into account the profile of the user (or user profile), as will be explained in the following. The second set of content items obtained by the filtering activity of module MMM is preferably a subset of the first set, but the second set may also contain all the content items of the first set, ordered according to their ranking so that the user can view the items in order of rank.

Module SALM aggregates, transforms and formats the content items in the subset for presentation to the user (STEP 205 in FIG. 2); instead of presenting the selected content items, it would be possible simply to notify the user of them. Presentation and/or notification are carried out by module SFEM.

According to a preferred embodiment of the present invention, the architecture of FIG. 1 comprises:

- a user profiles repository (UPR);
- a metadata generator module (MGM).

Module MGM provides a set of derivation rules in order to generate derived metadata (content and/or context). Derivation rules are based on derivation algorithms. In the embodiment of FIG. 1, these algorithms are external to module MGM and are provided by a derivation algorithm module (DAM) realized through "plug-in" technology; this allows both locally and remotely stored algorithms to be called or invoked by module MGM.

Module MMM retrieves:

- the user profile of the current user, from repository UPR; and
- for each content item in said first set of content items, the associated content metadata, both pre-assigned (typically authored) and possibly derived (as a result of previous interaction events), from repository CIR.

Additionally, module MMM receives the context metadata associated with the current context from other modules. Specifically, raw context metadata are received from module SFEM through module SALM, and derived context metadata are received from module MGM.

Module MMM sends raw context metadata to module MGM, asking it to generate derived metadata (starting from at least the raw context metadata), and receives the generated derived context metadata. In this way, at least some context metadata may be derived on the fly, i.e. during the interaction with the user.

Then, module MMM applies the user profile to the content metadata (both pre-assigned and derived) associated with each content item within the identified first set of content items and, preferably, to the context metadata (both raw and derived) associated with the current context. In this way, module MMM matches the first set of content items against the user profile. In the present embodiment, the user profile comprises at least a predictive model (preferably generated by machine learning methods). The predictive model is applied to each content item of said first set and generates a predicted vote for each content item. Module MMM uses the set of predicted votes associated with the first set of content items to produce a ranking of the content items of the first set. This ranking is provided to module SALM, which defines a second set of content items formed either by the ranked first set of content items or by a subset of the first set selected as a result of the ranking (e.g. comprising only the best-ranked content items of the first set).
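The ranking and selection steps shared between modules MMM and SALM can be sketched as follows. The predictive model is passed in as a plain function of (content metadata, context metadata); its signature, the threshold and top-k options, and the toy model are all hypothetical stand-ins for whatever the user profile actually contains.

```python
def rank_and_select(items, predict_vote, context, threshold=None, top_k=None):
    """Rank a first set of content items by predicted vote, best first.

    items        -- mapping item_id -> content metadata (dict)
    predict_vote -- callable(content_metadata, context_metadata) -> number
    context      -- current context metadata (raw and derived)
    threshold    -- optional minimum predicted vote for an item to be kept
    top_k        -- optional maximum number of items in the second set
    """
    scored = [(predict_vote(meta, context), item_id) for item_id, meta in items.items()]
    # sort() is stable (also with reverse=True), so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] >= threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [item_id for _, item_id in scored]


def toy_model(meta, ctx):
    """Hypothetical predictive model echoing the day-time rule learned earlier."""
    return 8 if ctx["Time"] == "day" and meta["MusicBeatSpeed"] >= 125 else 3


first_set = {"a": {"MusicBeatSpeed": 130}, "b": {"MusicBeatSpeed": 60}, "c": {"MusicBeatSpeed": 126}}
print(rank_and_select(first_set, toy_model, {"Time": "day"}, threshold=5))  # ['a', 'c']
```

Calling it without `threshold` or `top_k` corresponds to the case in which the second set contains all items of the first set, merely reordered.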

Preferably, the present embodiment additionally provides for:

-   a user interaction recorder module IRM, and
-   an interaction history repository IHR.

The interaction history may take the form of a sequence of records, each containing several pieces of information relating to, e.g., user requests (or the corresponding queries), system replies, context, metadata and user feedback. Preferably, a synthetic format is used (for example, links or indexes instead of physical items). Typically, each record of the interaction history corresponds to a different interaction event.

Module IRM has the task of updating the interaction history (STEP 206 in FIG. 2). To this end, module IRM directly records the user requests (received from module SFEM) into repository IHR. Additionally, module SALM records, through module IRM, its replies (in terms of content items) to the user requests into repository IHR. Module SALM may also record, through module IRM, the predicted votes and/or all or part of the (content and/or context) metadata used for replying to the user into repository IHR.

Advantageously, in order to save storage space in repository IHR, only one type of metadata, namely raw context metadata, is stored in repository IHR (since, at any time, the other metadata can be retrieved from repository CIR or generated by module MGM); this can be carried out by module IRM, which receives such metadata directly from module SFEM.
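A compact interaction history record along these lines might keep links to content items rather than the items themselves, and only the raw context metadata. The sketch below assumes an in-memory list as repository IHR; the field names, the `cell_id` context field and the method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InteractionRecord:
    """Hypothetical compact record of one interaction event in repository IHR."""

    user_id: str
    request: str                   # the user request (or the corresponding query)
    reply_item_ids: List[str]      # links/indexes to content items, not the items
    raw_context: Dict[str, Any]    # only raw context metadata is persisted
    predicted_votes: Dict[str, float] = field(default_factory=dict)
    feedback: Dict[str, float] = field(default_factory=dict)  # item id -> vote


class InteractionHistory:
    """Hypothetical in-memory stand-in for repository IHR, written via module IRM."""

    def __init__(self) -> None:
        self._records: List[InteractionRecord] = []

    def record(self, rec: InteractionRecord) -> int:
        """Append a record and return its index, used later to attach feedback."""
        self._records.append(rec)
        return len(self._records) - 1

    def add_feedback(self, index: int, item_id: str, vote: float) -> None:
        self._records[index].feedback[item_id] = vote

    def events_for(self, user_id: str) -> List[InteractionRecord]:
        return [r for r in self._records if r.user_id == user_id]


ihr = InteractionHistory()
i = ihr.record(InteractionRecord("u1", "news about sport", ["n2", "n3"],
                                 {"timestamp": "2005-10-28T08:30:00", "cell_id": "A17"}))
ihr.add_feedback(i, "n2", 5)
print(ihr.events_for("u1")[0].feedback)  # {'n2': 5}
```

Derived context metadata and content metadata are deliberately absent from the record; under this design they would be regenerated by module MGM or fetched from repository CIR when the profile is rebuilt.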

The service application may ask the user to provide feedback on the content items provided in reply to his requests; module SALM may also use module SFEM for this purpose. A typical form of user feedback is a vote, which can be directly compared with the predicted votes. In this case, module SALM may store such explicit feedback into repository IHR through module IRM. Advantageously, the service application logic is designed so as to leave the user free to provide explicit feedback or not.

Alternatively, when the service application logic does not provide for explicit feedback from the user, the user's behaviour can be monitored in order to derive implicit feedback from it (this can be carried out, e.g., by module SFEM); for example, a vote may be associated with the time spent by a user reading a news item provided by a news service. In this case, module IRM may record such implicit feedback into repository IHR.
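One way to turn reading time into an implicit vote is to compare the time actually spent with the time the item would be expected to take. The reading speed, the 1 to 5 vote scale and the ratio thresholds below are all illustrative assumptions; the text only states that a vote may be associated with the time spent.

```python
def implicit_vote(seconds_spent: float, word_count: int,
                  words_per_second: float = 4.0) -> int:
    """Map reading time to a 1-5 vote (thresholds are illustrative assumptions)."""
    # Expected reading time for the item; floor at one second to avoid division by zero.
    expected_seconds = max(word_count / words_per_second, 1.0)
    ratio = seconds_spent / expected_seconds
    # Upper bounds of the ratio for votes 1..4; anything at or above 0.9 is a 5.
    for vote, upper in enumerate((0.1, 0.3, 0.6, 0.9), start=1):
        if ratio < upper:
            return vote
    return 5


print(implicit_vote(50, 400))  # 3
```

Normalising by item length avoids rewarding long items simply because they take longer to read.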

The processing and recording of the user's feedback (STEP 207 of FIG. 2), be it explicit and/or implicit, may be carried out after every reply or at the end of a service interaction session.

It is to be noted that, if a content item comprises a plurality of content components, feedback may relate to the content item as a whole; advantageously, feedback may alternatively or additionally relate to each component of the content item. For example, a user may vote on a movie as a whole or give separate votes for its video component and its audio component. In the latter case, the separate votes are, for example, recorded in the interaction history.

Processing User Profiles

The activity of processing user profiles consists of creating (building) and maintaining (e.g., updating) the user profiles. In the architecture of FIG. 1, user profiles are stored in repository UPR, and a user profile builder module PBM is provided for carrying out this activity.

This activity may advantageously be carried out “off line”, for example during the night, when the number of user interactions is much smaller.

According to the present embodiment, module PBM performs the following steps:

-   retrieving from repository IHR an interaction history of a user (STEP 301 in FIG. 3), either the complete interaction history or a partial interaction history corresponding to the time frame from the last user profile update to the present time. The interaction history comprises at least one event, generally a set of events. An event typically comprises a content query (corresponding to the user request), raw context metadata, the set of selected content items provided following the query and, preferably, the user feedback;
-   selecting an appropriate machine learning algorithm (STEP 302 in FIG. 3) adapted to build a predictive model of the user preferences based on the information contained in the interaction history;
-   for each interaction event, E_(i), in the interaction history, generating a feature vector, typically a single vector, V_(i), of “n+1” dimensions (STEPS 303, 304 and 305 in FIG. 3), where n is the sum of the number of features related to the content metadata (pre-assigned and derived) and the number of features related to the context metadata (raw and derived);
-   applying the selected machine learning algorithm to the feature vectors (V_(i), V_(j), V_(k) . . . associated with events E_(i), E_(j), E_(k) . . . ) generated in the previous step in order to build a new predictive model to be incorporated in the user profile (STEP 306 in FIG. 3). Some machine learning algorithms can process only a single feature vector at a time, while others can process a set of feature vectors at a time (as in the example set out before, where the model is generated by processing thirteen vectors corresponding to thirteen interaction events);
-   (preferably) validating the performance of the newly built predictive model against a predetermined acceptance criterion (typically, a “better than before” type criterion) as a condition for the user profile update (STEP 307 in FIG. 3). For example, a known and effective validation technique is “ten-fold cross-validation”, which is based on ten different partitionings of the events (e.g. 90% of the events for building the model and 10% of the events for validating it). Depending on the particular implementation, validation can be integrated within the machine learning methods; and
-   updating the user profile by replacing the previous model with the new model in repository UPR (STEP 308 in FIG. 3).

The generation of a feature vector, V_(i), related to the interaction event E_(i) may be carried out according to the following steps:

-   retrieving the raw context metadata recorded in the interaction history;
-   sending the raw context metadata to module MGM, asking it to generate derived context metadata from them;
-   coding the raw context metadata and the derived context metadata (obtained from module MGM) into a context feature vector, V_(ix), of dimension p (the above steps relating to context metadata are indicated as a single STEP 303 in the flow chart of FIG. 3);
-   retrieving the content metadata (both authored and derived) from repository CIR;
-   coding the content metadata into a content feature vector, V_(ic), of dimension m, where m+p=n (the above two steps are indicated as a single STEP 304 in FIG. 3);
-   joining the context feature vector, V_(ix), and the content feature vector, V_(ic), generated in the previous steps, and adding the user vote as target feature, t, into a single feature vector of dimension n+1, V_(i)=<V_(ix), V_(ic), t> (STEP 305 in FIG. 3);
-   identifying a machine learning algorithm; and
-   applying said machine learning algorithm to said feature vector, V_(i), in order to obtain a predictive model of user preferences.
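The coding and joining steps can be illustrated with a fixed schema that turns textual metadata into 0/1 token indicators and copies numerical metadata as-is, then concatenates V_(ix), V_(ic) and the target vote t. The schema syntax (`field=token` for a textual indicator, a bare `field` for a numerical component) and the example fields are hypothetical choices made for this sketch.

```python
from typing import Any, Dict, List, Sequence


def encode(metadata: Dict[str, Any], schema: Sequence[str]) -> List[float]:
    """Code metadata into a fixed-length vector following a hypothetical schema.

    An entry "field=token" becomes 1.0 if the textual field contains the token
    (case-insensitive, whitespace-separated) and 0.0 otherwise; a bare "field"
    entry is a numerical component copied as a float (missing -> 0.0).
    """
    vector: List[float] = []
    for entry in schema:
        if "=" in entry:
            name, token = entry.split("=", 1)
            words = str(metadata.get(name, "")).lower().split()
            vector.append(1.0 if token.lower() in words else 0.0)
        else:
            vector.append(float(metadata.get(entry, 0.0)))
    return vector


def build_feature_vector(context_md: Dict[str, Any], content_md: Dict[str, Any],
                         vote: float, context_schema: Sequence[str],
                         content_schema: Sequence[str]) -> List[float]:
    """Return V_i = <V_ix, V_ic, t> with dimension p + m + 1 = n + 1."""
    v_ix = encode(context_md, context_schema)  # context feature vector, dimension p
    v_ic = encode(content_md, content_schema)  # content feature vector, dimension m
    return v_ix + v_ic + [float(vote)]         # target feature t appended last


context_schema = ["part_of_day=morning", "part_of_day=evening", "is_weekend"]
content_schema = ["genre=sport", "genre=politics", "word_count"]
v_i = build_feature_vector(
    {"part_of_day": "morning", "is_weekend": False},   # raw + derived context metadata
    {"genre": "sport", "word_count": 350},              # authored + derived content metadata
    4, context_schema, content_schema)
print(v_i)  # [1.0, 0.0, 0.0, 1.0, 0.0, 350.0, 4.0]
```

Fixing the schema per user profile keeps every V_(i) the same length, which most learning algorithms require.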

It is to be noted that the above-listed steps may be used even when there is no previously existing predictive model, i.e. they may be used not only for updating a user profile but also for building a new user profile. In that case, a fictitious user model would be used in which, for instance, positive feedback is assumed for any content item.
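For a new user, the fictitious model can be as simple as a function that returns a positive vote for every feature vector, used as a fallback until a real model has been built. The vote value (the top of an assumed 1 to 5 scale) and the dictionary standing in for repository UPR are assumptions for illustration.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[Sequence[float]], float]
POSITIVE_VOTE = 5.0  # assumed top of a 1-5 vote scale


def fictitious_model(_: Sequence[float]) -> float:
    """Cold-start model: assume positive feedback for any content item."""
    return POSITIVE_VOTE


def current_model(profiles: Dict[str, Model], user_id: str) -> Model:
    """Return the user's predictive model, or the fictitious one for a new user."""
    return profiles.get(user_id, fictitious_model)


print(current_model({}, "new-user")([0.0, 1.0]))  # 5.0
```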

If the user is allowed to express feedback on each component of a multi-component content item, module PBM should take such more detailed feedback into account.

The above description assumes that a user has only one user profile. However, the present invention may be extended to the case where a user has several user profiles and can switch between them. This may be advantageous, e.g., when context metadata are insufficient for an accurate description of the interaction context; for example, it is difficult for a terminal (i.e. an interaction device) to determine automatically whether a user is at home or in the office (unless the user sets an environment option in the interaction device).

The choice of the user profile can be made at the beginning of an interaction session, which usually comprises a number of requests and corresponding replies with implicit or explicit feedback.

Alternatively, the choice of the user profile can be made on the fly at the very moment of the feedback action.

For example, let us assume that a user, in a movie browsing application, finds a horror movie he likes. He gives a high first vote to that item, specifying that the first vote refers to the “Personal” profile. Since horror movies are not suitable for his children, he also gives a very low second vote to the same movie, this time specifying that the second vote refers to the “Family” profile. Alternatively, the user can set one of his profiles as the current profile; the votes he gives will then refer to that profile. When the user asks for rankings or recommendations about movies, he will need to specify the profile under which the recommendations are to be given.
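The “Personal”/“Family” example can be sketched as a per-user store that files each vote under either an explicitly named profile or the current profile. The class and method names are hypothetical; in the described architecture this bookkeeping would be spread across modules IRM, PBM and MMM.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Optional


class MultiProfileUser:
    """Hypothetical per-user store of votes kept separately for each named profile."""

    def __init__(self, default_profile: str) -> None:
        self.current_profile = default_profile
        self.votes: DefaultDict[str, Dict[str, float]] = defaultdict(dict)

    def vote(self, item_id: str, value: float, profile: Optional[str] = None) -> None:
        # A vote refers to the explicitly named profile, else to the current one.
        self.votes[profile or self.current_profile][item_id] = value

    def votes_for(self, profile: str) -> Dict[str, float]:
        # .get() on a defaultdict does not create a new entry for unknown profiles.
        return dict(self.votes.get(profile, {}))


user = MultiProfileUser("Personal")
user.vote("horror-42", 5)                    # referred to the current profile
user.vote("horror-42", 1, profile="Family")  # explicitly referred to "Family"
print(user.votes_for("Personal"), user.votes_for("Family"))
# {'horror-42': 5} {'horror-42': 1}
```

Each profile's votes would then feed its own predictive model, so a ranking request must name the profile to use.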

In an embodiment wherein multiple user profiles are provided, the modules of FIG. 1 need to take this multiplicity into account. Module IRM needs to also record information about the user profile into repository IHR. Module PBM needs to choose the correct user profile to be updated. Module MMM needs to select and use the correct user profile for producing the ranking of content items.

The invention claimed is:
1. A method of providing selected content items to a user, comprising the steps of: A) identifying a first set of content items based on a request by the user, wherein first content metadata are pre-assigned to the content items of said first set; B) automatically generating second content metadata for the content items of said first set on a basis of at least a first derivation rule, said derivation rule corresponding to an algorithm applied to at least the content items of said first set; C) associating said second content metadata to the content items of said first set; and D) providing a second set of selected content items from said first set on a basis of said first content metadata and said second content metadata, wherein step D) is carried out based on a user profile of the user comprising a predictive model, and wherein the method further comprises building the predictive model based on a feature vector and metadata, wherein the metadata comprises textual components and numerical components, and wherein the feature vector comprises metadata including at least one textual component from at least one of the first content metadata or the second content metadata, wherein in step D) a ranking to each of the content items of said first set is provided on a basis of said second content metadata and of said predictive model so as to define said second set of content items based on said ranking.
2. The method according to claim 1, wherein said algorithm is to be applied also to at least some of said first content metadata.
3. The method according to claim 1, wherein step B) is carried out also on the basis of first context metadata relating to an interaction context of said selected content items in order to automatically generate second context metadata.
4. The method according to claim 1, further comprising deriving said second content metadata from an algorithm applied to a plurality of content items.
5. The method according to claim 1, wherein said ranking to each of the content items of said first set is provided on the basis of said second content metadata, of said second context metadata and of said user profile, and said second set of content items is selected based on said ranking.
6. The method according to claim 1, wherein said second set of content items is provided as replies to corresponding requests by said user.
7. The method according to claim 1, further comprising the step of associating a feedback from said user to at least a content item of said second set.
8. The method according to claim 7, further comprising the step of recording at least a content item of said second set and a feedback from said user related to said at least a recorded content item.
9. The method according to claim 8, comprising the step of recording at least part of said second content metadata used for selecting said at least a content item.
10. The method according to claim 8, comprising the step of recording said user request.
11. The method according to claim 8, comprising the step of building or updating the predictive model of said user based at least on said recorded content item and user feedback.
12. The method according to claim 11, comprising the step of recording at least part of said second content metadata and the step of building or updating a predictive model of said user based at least on recorded second content metadata.
13. The method according to claim 12, wherein said predictive model is built or updated through at least one machine learning algorithm applied to at least said second content metadata.
14. The method according to claim 13, wherein said machine learning algorithm is applied also to at least some context metadata.
15. The method according to claim 1, wherein the user profile comprises at least two predictive models.
16. The method according to claim 1, wherein said second set of content items is provided by a service provider.
17. The method according to claim 1, wherein said selected content items are provided through a telecommunication network.
18. A non-transitory computer readable medium encoded with a computer program product comprising software code portions, which when loaded into the memory of at least one computer and executed by the at least one computer, perform the method of claim 1.
19. A method for providing a content-based service comprising: collecting content requests from a user; identifying a first set of content items based on said content requests by the user, wherein first content metadata are pre-assigned to the content items of said first set; automatically generating second content metadata for the content items of said first set on a basis of at least a first derivation rule, said first derivation rule corresponding to an algorithm applied to at least the content items of said first set; associating said second content metadata to the content items of said first set; and providing a second set of selected content items from said first set on a basis of said first content metadata and said second content metadata, wherein providing the second set is carried out based on a user profile of the user comprising a predictive model, and wherein the method further comprises building the predictive model based on a feature vector and metadata, wherein the metadata comprises textual components and numerical components, and wherein the feature vector comprises metadata including at least one textual component from at least one of the first content metadata or the second content metadata, wherein providing the second set comprises providing a ranking to each of the content items of said first set on a basis of said second content metadata and of said predictive model so as to define said second set of content items based on said ranking.